An optimal cardinality estimation algorithm based on order statistics and its full analysis

Author

  • Jérémie Lumbroso
Abstract

Building on the ideas of Flajolet and Martin (1985), Alon et al. (1987), Bar-Yossef et al. (2002), and Giroire (2005), we develop a new algorithm for cardinality estimation based on order statistics which, according to Chassaing and Gerin (2006), is optimal among similar algorithms. This algorithm has a remarkably simple analysis that allows us to take its fine-tuning and the characterization of its properties further than has been done until now. We prove that, asymptotically, it is strictly unbiased (unlike Probabilistic Counting, LogLog, and HyperLogLog); we verify that its relative precision is about $1/\sqrt{m-2}$ when $m$ words of storage are used; and we fully characterize the limit law of the estimates it provides in terms of the gamma distribution. This is the first such algorithm for which the limit law has been established. We also develop a Poisson analysis for the pre-asymptotic regime. In this way, we are able to devise a complete algorithm covering all cardinality ranges, from 0 to very large.
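The abstract does not spell out the estimator, so the following is only a minimal sketch of one order-statistics scheme that is consistent with the stated properties. It assumes hashed values are uniform on (0, 1], splits the stream into m buckets, keeps the minimum hashed value per bucket, and returns m(m-1) divided by the sum of the minima. Under the usual approximation that each bucket minimum is exponential with rate n/m, that sum follows a gamma law, which yields an unbiased estimate with relative standard error 1/sqrt(m-2). The function names, the use of SHA-256, and the default m = 256 are illustrative assumptions, and the Poisson correction for small cardinalities mentioned in the abstract is not included.

```python
import hashlib


def hash_to_bucket_and_unit(item, m):
    """Map an item to (bucket index, pseudo-uniform value in (0, 1]).

    Assumption: SHA-256 stands in for an idealized uniform hash function.
    """
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % m
    value = (int.from_bytes(digest[8:16], "big") + 1) / (2**64 + 1)
    return bucket, value


def estimate_cardinality(items, m=256):
    """Estimate the number of distinct items from per-bucket minima.

    Sketch only: valid in the asymptotic regime (many more distinct
    items than buckets); small cardinalities need a separate correction.
    """
    if m < 3:
        raise ValueError("m must be at least 3 for a finite variance")
    minima = [1.0] * m  # an empty bucket keeps the maximal value 1.0
    for item in items:
        bucket, value = hash_to_bucket_and_unit(item, m)
        if value < minima[bucket]:
            minima[bucket] = value
    # Sum of minima is approximately Gamma(m, rate n/m), so
    # m * (m - 1) / sum is approximately unbiased for n.
    return m * (m - 1) / sum(minima)
```

Because only the minimum of each bucket is stored, repeated items never change the state, so the estimate depends on the set of distinct items rather than on the length of the stream. With m = 256 the expected relative error under these assumptions is roughly 1/sqrt(254), or about 6%.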


Similar resources

On Concomitants of Order Statistics from Farlie-Gumbel-Morgenstern Bivariate Lomax Distribution and its Application in Estimation

In this paper, we have dealt with the distribution theory of concomitants of order statistics arising from Farlie-Gumbel-Morgenstern bivariate Lomax distribution. We have discussed the estimation of the parameters associated with the distribution of the variable Y of primary interest, based on the ranked set sample defined by ordering the marginal observations...


Three-stage inversion improvement for forest height estimation using dual-PolInSAR data

This paper addresses an algorithm for forest height estimation using single frequency single baseline dual polarization radar interferometry data. The proposed method is based on a physical two layer volume over ground model and is represented using polarimetric synthetic aperture radar interferometry (PolInSAR) technique. The presented algorithm provides the opportunity to take advantages of t...


Optimal Design of FPI^λ D^μ based Stabilizers in Hybrid Multi-Machine Power System Using GWO ‎Algorithm

In this paper, the theory and modeling of large scale photovoltaic (PV) in the power grid and its effect on power system stability are studied. In this work, the basic module, small signal modeling and mathematical analysis of the large scale PV jointed multi-machine are demonstrated. The principal portion of the paper is to reduce the low frequency fluctuations by tuned stabilizer in the atten...


Development of a Pharmacogenomics Model based on Support Vector Regression with Optimal Features Selection Approach to Determine the Initial Therapeutic Dose of Warfarin Anticoagulant Drug

Introduction: The use of artificial intelligence tools in pharmacogenomics is one of the latest research fields in bioinformatics. One of the most important drugs whose initial therapeutic dose is difficult to determine is the anticoagulant warfarin. Warfarin is an oral anticoagulant that, due to its narrow therapeutic window and complex interrelationships of individual factors, the selection of its ...





Publication date: 2010